3527: Avro-4193: [c] segfault cannot handle empty byte array#14
3527: Avro-4193: [c] segfault cannot handle empty byte array#14martin-augment wants to merge 2 commits intomainfrom
Conversation
When avro-c tries to encode an empty byte array it will avro_malloc(0) which on some architectures will return NULL. Make sure this is not interpreted as an error or or dereferenced causing a segfault.
|
Caution Review failedFailed to post review comments WalkthroughThis pull request extends Avro's language bindings with logical type support (decimal, duration, uuid, date, time, timestamp), strengthens PHP type safety via comprehensive type hints, updates C library dependencies, and refines binary encoding to omit NUL terminators from JSON strings. Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Pull Request ReviewThis PR merges three separate commits addressing bug fixes in C and feature additions in PHP:
Overall AssessmentThe changes are generally well-implemented with appropriate test coverage. However, there are some concerns and suggestions for improvement. C Code ChangesPositive Aspects
ConcernsCRITICAL - Memory Safety Issue in uint8_t *dest8 = (uint8_t *) avro_malloc(utf8_len);
if (dest8 == NULL && utf8_len) {The condition uint8_t *curr = dest8;
for (i = 0; i < src_len; i++) {
if (src8[i] & 0x80) {
*curr++ = (0xc0 | (src8[i] >> 6)); // Could crash if dest8/curr is NULL
*curr++ = (0x80 | (src8[i] & 0x3f));Recommendation: Initialize Minor - Test allocator fix ( PHP Code ChangesPositive Aspects
ConcernsMedium - Decimal precision validation ( $maxValue = 10 ** $precision;
if (abs($value) >= $maxValue) {This validation is incorrect. For decimal with precision=5, the max value should be 99999 (5 nines), but Low - Inconsistent error handling ( if (!is_numeric($decimal)) {
throw new AvroException("Decimal value '{$decimal}' must be numeric");
}Consider using a more specific exception type if available, or at least document the expected exception hierarchy. Low - Potential rounding issues ( $value = $decimal * (10 ** $scale);
if (!is_int($value)) {
$value = (int) round($value);
}Using Low - Magic number ( if (strlen($encodedDuration) !== 12) {Consider defining a constant like Code style - Mixed return types ( public function writeNull($datum): void
{
return; // Unnecessary explicit return in void function
}While not wrong, explicit Workflow & Configuration ChangesPositive
ConcernsLow - Composer version pinning: Test CoverageStrengths
Gaps
Security Concerns
Performance Considerations
RecommendationsMust Fix
Should Fix
Consider
ConclusionThis PR addresses real bugs and adds valuable functionality. The C fixes for null byte handling and empty arrays are critical for correctness. The PHP logical types support is well-implemented with good test coverage. However, the potential null pointer dereference in the C code for zero-length arrays should be addressed before merging. The PHP decimal validation issue is less critical but should also be fixed. Overall: Request Changes - Fix the critical C issue, then this will be ready to merge. |
| } | ||
|
|
||
| json_t *result = json_string_nocheck((const char *) encoded); | ||
| json_t *result = json_stringn_nocheck((const char *) encoded ? encoded : "", encoded_size); |
There was a problem hiding this comment.
This conditional cast applies only to the condition; as written, (const char *) encoded ? encoded : "" passes a pointer of incompatible type and may fail to compile. Consider casting the entire conditional result so the argument to json_stringn_nocheck is a const char * in both branches.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
| $scale = $logicalType->attributes()['scale'] ?? 0; | ||
| $precision = $logicalType->attributes()['precision'] ?? null; | ||
|
|
||
| $encoder->writeDecimal($datum, $scale, $precision); |
There was a problem hiding this comment.
For fixed decimals, Avro requires fixed-size two’s complement bytes without a length prefix; calling writeDecimal here writes a length-prefixed byte array. Consider writing exactly $writers_schema->size() bytes for fixed decimal (also applies to reading at AvroIODatumReader::readFixed where readBytes() is used at line 353).
🤖 React with 👍 or 👎 to let us know if the comment was useful.
| ); | ||
| } | ||
|
|
||
| $packed = pack('J', $value); |
There was a problem hiding this comment.
pack('J', $value) uses machine endianness; Avro decimal logical types (bytes/fixed) must use big-endian two’s complement. On little-endian platforms this will emit the wrong byte order, so consider producing big-endian bytes (or reversing before trimming) to ensure interoperable encoding.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
|
closed this PR because the |
Code Review: AVRO-4193 - Support for Empty Byte ArraysSummaryThis PR fixes a critical bug where encoding empty byte arrays causes segfaults due to Code Quality & Best Practices ✅Strengths:
Potential Issues 🔍Critical: Potential UB with memcpy (HIGH PRIORITY)Location: char *bytes_copy = (char *) avro_malloc(size);
if (!bytes_copy && size) {
// error handling
}
memcpy(bytes_copy, bytes, size); // ⚠️ ISSUE HEREProblem: When Recommendation: Add a guard before memcpy: if (size > 0) {
memcpy(bytes_copy, bytes, size);
}This affects:
Medium: Test allocator logic improvementLocation: The fix adds a NULL check before dereferencing ptr, which is good. However, the test allocator should document the behavior that NULL is a valid return for size=0 allocations. Current code: if (nsize == 0) {
if (ptr) {
// free logic
}
return NULL;
}This correctly handles free(NULL), which is good defensive programming. ✅ Minor: Schema decref duplicationLocation: avro_schema_decref(writer_schema); // Line 230
// ... more code ...
avro_schema_decref(writer_schema); // Line 243 - duplicate?The schema is decref'd twice in the test. This appears to be correct since there are two separate test sections, but verify the reference counting is balanced. Performance Considerations ⚡
Security Concerns 🔒Positive Changes:
Remaining Concerns:
Test Coverage 📊Excellent Coverage ✅:
Suggestions:
RecommendationsMust Fix:
Should Consider:
Nice to Have:
VerdictApprove with changes requested ✅ The fix correctly addresses the segfault issue and includes good test coverage. However, the memcpy undefined behavior should be fixed before merging to ensure full C standard compliance. Great work on identifying and fixing this edge case! The test coverage demonstrates thorough thinking about the problem. |
|
Findings
Let me know if you’d like help patching these spots or adding regression coverage. |
3527: To review by AI